connection: propagate I/O errors to network filters.#13941
connection: propagate I/O errors to network filters.#13941PiotrSikora wants to merge 9 commits intoenvoyproxy:masterfrom
Conversation
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
PiotrSikora
left a comment
There was a problem hiding this comment.
This is my attempt at a fix for #13939. It's still work-in-progress, so errors are caught only in the raw_buffer_socket and there are no new tests, but I'd like to get feedback on the general direction.
| stream_info_.setUpstreamTransportFailureReason(kUpstreamConnectionTerminationDetails); | ||
| stream_info_.setConnectionTerminationDetails(kUpstreamConnectionTerminationDetails); | ||
| stream_info_.setResponseFlag(StreamInfo::ResponseFlag::UpstreamConnectionTermination); | ||
| } |
There was a problem hiding this comment.
Those values are set on the upstream connection's stream info, and they are never propagated to downstream connection's stream info or access logs. I'm not sure what's the best way to solve that.
There was a problem hiding this comment.
Could the high level request object query the connection as part of the termination process to determine if termination was graceful or abrupt?
ggreenway
left a comment
There was a problem hiding this comment.
I'm in favor of distinguishing between closing due to error, or graceful.
For the transport socket signaling, one idea is to have another PostIoAction. But I'm not entirely sure that will make the interface any cleaner.
Should this change also include a new ConnectionEvent to send to the network filters?
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Something like the updated commit?
The event is propagated to network filters via Right now, there is no way to dispatch |
|
(Note: a few tests are failing because unmocked expectations for set response flags, etc., so please ignore them for now) |
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
antoniovicente
left a comment
There was a problem hiding this comment.
Very interesting PR. I guess my feedback boils down to: Does it make sense to have both graceful and error close PostIoActions? A majority of the Close PostIoActions are clearly errors. The rest may also be errors.
|
|
||
| end_stream_read_ = result.end_stream_read_; | ||
| read_error_ = result.action_ == Network::PostIoAction::Close; | ||
| read_error_ = result.action_ == Network::PostIoAction::CloseError; |
There was a problem hiding this comment.
Should this be:
result.action_ != Network::PostIoAction::KeepOpen;
or consider renaming read_error_ to got_close_error_
I guess one thing to keep in mind is that as of this PR RawBufferSocket can only return KeepOpen or CloseError from doRead. Should we consider all Close PostIoActions as errors?
| KeepOpen | ||
| KeepOpen, | ||
| // Close the connection because of an error. | ||
| CloseError, |
There was a problem hiding this comment.
It seems that a majority of Close cases are being renamed CloseError.
Should we also rename Close to CloseGraceful or similar as a way to improve our chances of catching all cases that need to be adjusted?
| stream_info_.setUpstreamTransportFailureReason(kUpstreamConnectionTerminationDetails); | ||
| stream_info_.setConnectionTerminationDetails(kUpstreamConnectionTerminationDetails); | ||
| stream_info_.setResponseFlag(StreamInfo::ResponseFlag::UpstreamConnectionTermination); | ||
| } |
There was a problem hiding this comment.
Could the high level request object query the connection as part of the termination process to determine if termination was graceful or abrupt?
| } else { | ||
| stream_info_.setUpstreamTransportFailureReason(kUpstreamConnectionTerminationDetails); | ||
| stream_info_.setConnectionTerminationDetails(kUpstreamConnectionTerminationDetails); | ||
| stream_info_.setResponseFlag(StreamInfo::ResponseFlag::UpstreamConnectionTermination); |
There was a problem hiding this comment.
nit: delegate stats update to virtual method so you can avoid the dynamic cast.
| stream_info_.setResponseFlag(StreamInfo::ResponseFlag::UpstreamConnectionTermination); | ||
| } | ||
| // Force "end_stream" so that filters can process this error. | ||
| result.end_stream_read_ = true; |
There was a problem hiding this comment.
I usually associate end_stream_read_ with graceful termination. Is it appropriate to set this? Do we need alternate signaling to notify filters about abrupt terminations via either additional arguments to push pipeline or a new virtual method which is called on abrupt termination.
There was a problem hiding this comment.
I agree; I think of end_stream as always a graceful operation. I think a different signal for error is more appropriate.
There was a problem hiding this comment.
The issue is that there is no different signal right now, and if we don't set this, then network filters won't be executed at all.
Note that this PR is meant to be a temporary fix until we have a proper solution for #13940, but that's going to require a lot more changes and time.
There was a problem hiding this comment.
This seems like a massive hack. How much effort would it be to implement the real fix?
There was a problem hiding this comment.
That depends on what do you consider a real fix. I'm not too familiar with that part of the codebase, but if we want to propagate upstream connection events to network filters, then I imagine it's quite a lot of changes, so I think it would be better if one of maintainers could work on this.
There was a problem hiding this comment.
I don't know what to suggest. It would be good to figure out how to do a catch-all cleanup of WASM resources on abnormal connection termination. Could WASM detect connection termination based on an L4 filter that does the relevant signaling on destruction?
There was a problem hiding this comment.
FWIW, we have a workaround in #13836 that works for Wasm.
| info_->state() != Ssl::SocketState::ShutdownSent) { | ||
| PostIoAction action = doHandshake(); | ||
| if (action == PostIoAction::Close || info_->state() != Ssl::SocketState::HandshakeComplete) { | ||
| if (action != PostIoAction::KeepOpen || info_->state() != Ssl::SocketState::HandshakeComplete) { |
There was a problem hiding this comment.
Also change post actions in NotReadySslSocket to CloseError
| ENVOY_CONN_LOG(debug, "TSI: Handshake failed: end of stream without enough data", | ||
| callbacks_->connection()); | ||
| return Network::PostIoAction::Close; | ||
| return Network::PostIoAction::CloseError; |
There was a problem hiding this comment.
There are 2 remaining uses of Network::PostIoAction::Close; in this file when finding a read 0 during handshakes. Would it make sense to also consider those cases CloseError?
There was a problem hiding this comment.
I changed one to CloseError and one to CloseGraceful (when the certificate validation failed, since that's not and an I/O error).
There was a problem hiding this comment.
I guess the question is wherever or it is worth having the extra complexity that comes from having 2 close types when out of the 3 remaining uses 2 of them are handshake errors. I guess it's fine to use CloseGraceful for the 3 remaining cases.
| default: | ||
| handshake_callbacks_->onFailure(); | ||
| return PostIoAction::Close; | ||
| return PostIoAction::CloseError; |
There was a problem hiding this comment.
Does the PostIoAction::Close on line 233 above a graceful or error close?
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
| server_connection->write(data, false); | ||
| EXPECT_EQ(data.length(), 0); | ||
| // Calling close(FlushWrite) in onEvent() callback results in PostIoAction::Close, | ||
| // Calling close(FlushWrite) in onEvent() callback results in PostIoAction::CloseGraceful, |
There was a problem hiding this comment.
Seeing some failures in //test/extensions/transport_sockets/tls:ssl_socket_test, examples:
2020-11-14T01:05:04.6013231Z Uninteresting mock function call - taking default action specified at:
2020-11-14T01:05:04.6013980Z test/mocks/stream_info/mocks.cc:30:
2020-11-14T01:05:04.6014503Z Function call: setConnectionTerminationDetails("downstream connection was terminated")
2020-11-14T01:05:04.6024240Z unknown file: Failure
2020-11-14T01:05:04.6025476Z Uninteresting mock function call - taking default action specified at:
2020-11-14T01:05:04.6026043Z test/mocks/stream_info/mocks.cc:24:
2020-11-14T01:05:04.6026487Z Function call: setResponseFlag(16384)
There was a problem hiding this comment.
Yeah, see my previous comment - I didn't add mocks for them, since it's a lot of extra noise in PR for code that was still being discussed. Also, I'm not sure if all 3 values (connection termination details, upstream transport socket failure and response flags) should be set.
| stream_info_.setResponseFlag(StreamInfo::ResponseFlag::UpstreamConnectionTermination); | ||
| } | ||
| // Force "end_stream" so that filters can process this error. | ||
| result.end_stream_read_ = true; |
There was a problem hiding this comment.
I don't know what to suggest. It would be good to figure out how to do a catch-all cleanup of WASM resources on abnormal connection termination. Could WASM detect connection termination based on an L4 filter that does the relevant signaling on destruction?
| ENVOY_CONN_LOG(debug, "TSI: Handshake failed: end of stream without enough data", | ||
| callbacks_->connection()); | ||
| return Network::PostIoAction::Close; | ||
| return Network::PostIoAction::CloseError; |
There was a problem hiding this comment.
I guess the question is wherever or it is worth having the extra complexity that comes from having 2 close types when out of the 3 remaining uses 2 of them are handshake errors. I guess it's fine to use CloseGraceful for the 3 remaining cases.
|
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
|
This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
Signed-off-by: Piotr Sikora piotrsikora@google.com